AITopics | github repository

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Neural Information Processing Systems, Jun-23-2026, 00:42:03 GMT

We introduce FreshStack, a holistic framework for automatically building information retrieval (IR) evaluation benchmarks by incorporating challenging questions and answers. FreshStack performs three steps: (1) automatic corpus collection from code and technical documentation, (2) nugget generation from community-asked questions and answers, and (3) nugget-level support, in which documents are retrieved using a fusion of retrieval techniques and hybrid architectures. We use FreshStack to build five datasets on fast-growing, recent, and niche domains so that the tasks are sufficiently challenging. On FreshStack, existing retrieval models applied out of the box significantly underperform oracle approaches on all five domains, indicating substantial headroom for improving IR quality. In addition, we identify cases where rerankers do not improve first-stage retrieval accuracy (two of the five domains), and we find that oracle context helps an LLM generator produce a high-quality RAG answer. We hope FreshStack will facilitate future work toward constructing realistic, scalable, and uncontaminated IR and RAG evaluation benchmarks.
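The abstract says FreshStack retrieves documents with "a fusion of retrieval techniques" but does not name the fusion rule. Reciprocal rank fusion (RRF) is one common choice, so the following Python sketch assumes RRF with the conventional smoothing constant k = 60. The retriever names (`bm25`, `dense`) and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    rankings: list of lists, each ordered best-first by one retriever.
    k: smoothing constant (60 is the value commonly used with RRF).
    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a lexical and a dense retriever for one query.
bm25 = ["d3", "d1", "d7"]
dense = ["d1", "d5", "d3"]
print(reciprocal_rank_fusion([bm25, dense]))  # → ['d1', 'd3', 'd5', 'd7']
```

RRF only needs ranks, not comparable scores, which makes it a convenient way to combine heterogeneous retrievers; whether FreshStack fuses ranks or scores is not stated in the abstract.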

information retrieval, large language model, machine learning

Country:

Europe (1.00)
North America > United States > Maryland (0.28)
North America > Canada > British Columbia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)
Research Report > New Finding (0.67)
Workflow (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Overleaf Example

Neural Information Processing Systems, Jun-21-2026, 09:07:01 GMT

Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch, for example to remove corrupted or outdated data or to respect a user's "right to be forgotten." Certified machine unlearning is a strong theoretical guarantee, based on differential privacy, that quantifies the extent to which an algorithm erases data from the model weights. In contrast to existing work on certified unlearning for convex or strongly convex loss functions, or for nonconvex objectives under limiting assumptions, we propose the first first-order, black-box algorithm (i.e., one that can be applied to models pretrained with vanilla gradient descent) for unlearning on general nonconvex loss functions. It unlearns by "rewinding" to an earlier step of the learning process and then performing gradient descent on the loss function of the retained data points. We prove (ε, δ)-certified unlearning and performance guarantees that establish the privacy-utility-complexity tradeoff of our algorithm, and we prove generalization guarantees for functions that satisfy the Polyak-Łojasiewicz inequality. Finally, we demonstrate that our algorithm outperforms existing methods within a new experimental framework that more accurately reflects unlearning user data in practice.
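To make the "rewinding" idea concrete, here is a toy Python sketch on a scalar parameter. It is not the authors' algorithm: the nonconvex loss, checkpoint index, step counts, and learning rate are illustrative assumptions, and the final Gaussian perturbation merely stands in for whatever noise calibration the (ε, δ) guarantee actually requires. With the noise scale set to zero, the rewound model should land near the model retrained from scratch on the retained data.

```python
import math
import random


def grad_loss(w, x):
    """Gradient of a hypothetical nonconvex per-example loss
    f(w; x) = (w - x)**2 + 0.3 * sin(3 * w)."""
    return 2.0 * (w - x) + 0.9 * math.cos(3.0 * w)


def mean_grad(w, data):
    """Gradient of the average loss over a dataset."""
    return sum(grad_loss(w, x) for x in data) / len(data)


def train(data, w0, steps, lr):
    """Vanilla gradient descent that keeps every iterate as a checkpoint."""
    checkpoints = [w0]
    w = w0
    for _ in range(steps):
        w -= lr * mean_grad(w, data)
        checkpoints.append(w)
    return checkpoints


def rewind_unlearn(checkpoints, retained, rewind_to, steps, lr, sigma, rng):
    """Rewind to an earlier checkpoint, run gradient descent on the retained
    data only, then add Gaussian noise of scale sigma (assumed mechanism)."""
    w = checkpoints[rewind_to]
    for _ in range(steps):
        w -= lr * mean_grad(w, retained)
    return w + rng.gauss(0.0, sigma)


full_data = [0.0, 0.2, 0.4, 3.0]  # the last point is the one to forget
retained = full_data[:-1]
ckpts = train(full_data, w0=1.0, steps=50, lr=0.05)
w_unlearned = rewind_unlearn(ckpts, retained, rewind_to=10, steps=150,
                             lr=0.05, sigma=0.0, rng=random.Random(0))
w_retrained = train(retained, w0=1.0, steps=200, lr=0.05)[-1]
print(w_unlearned, w_retrained)
```

Rewinding to an earlier checkpoint rather than to the final weights is what lets the retained-data descent "forget" the influence the removed point had on later iterates; in the paper, how far to rewind and how much noise to add are governed by the proven privacy-utility-complexity tradeoff.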

algorithm, artificial intelligence, machine learning

Country: North America > United States > Illinois (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)
Overview (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SWE-smith: Scaling Data for Software Engineering Agents

Neural Information Processing Systems, Jun-13-2026, 05:31:01 GMT

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Scorio.jl: A Julia package for ranking stochastic responses

Hariri, Mohsen, Hinczewski, Michael, Chaudhary, Vipin

arXiv.org Machine Learning, Mar-17-2026

Scorio.jl is a Julia package for evaluating and ranking systems from repeated responses to shared tasks. It provides a common tensor-based interface for direct score-based, pairwise, psychometric, voting, graph, and listwise methods, so the same benchmark can be analyzed under multiple ranking assumptions. We describe the package design, position it relative to existing Julia tools, and report pilot experiments on synthetic rank recovery, stability under limited trials, and runtime scaling.
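Scorio.jl is a Julia package, and its API is not reproduced here. As a language-neutral illustration of analyzing one benchmark under two ranking assumptions, this Python sketch stores repeated responses as a systems × tasks × trials nested list (an assumed layout) and ranks systems by (a) mean score and (b) a Copeland-style pairwise rule over per-task means. The pairwise rule and the toy data are illustrative choices, not necessarily what Scorio.jl implements.

```python
def task_means(R):
    """Per-task mean over trials; R[s][t][k] is system s, task t, trial k."""
    return [[sum(trials) / len(trials) for trials in tasks] for tasks in R]


def mean_scores(R):
    """Score-based method: average of per-task means for each system."""
    return [sum(row) / len(row) for row in task_means(R)]


def copeland_scores(R):
    """Pairwise method: a system earns 1 point for each rival it beats on
    more tasks than it loses to (0.5 each on a tie)."""
    M = task_means(R)
    scores = [0.0] * len(M)
    for a in range(len(M)):
        for b in range(a + 1, len(M)):
            wins_a = sum(x > y for x, y in zip(M[a], M[b]))
            wins_b = sum(y > x for x, y in zip(M[a], M[b]))
            if wins_a > wins_b:
                scores[a] += 1.0
            elif wins_b > wins_a:
                scores[b] += 1.0
            else:
                scores[a] += 0.5
                scores[b] += 0.5
    return scores


def rank(scores):
    """System indices ordered best-first (stable on ties)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Two systems, three tasks, four binary-scored trials per task (toy data).
R = [
    [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]],  # system 0
    [[0, 0, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1]],  # system 1
]
print(rank(mean_scores(R)))      # → [0, 1]
print(rank(copeland_scores(R)))  # → [1, 0]
```

In the toy data, system 0 has the higher mean because of one task it solves perfectly, while system 1 wins more individual tasks, so the score-based and pairwise rankings disagree. This kind of divergence is why a common interface over several ranking methods is useful.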

artificial intelligence, machine learning, scorio

arXiv:2603.14103

Country:

North America > United States > Ohio > Cuyahoga County > Cleveland (0.05)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

Jianfei Yang, He Huang, Yunjiao Zhou

Neural Information Processing Systems, Feb-19-2026, 07:25:03 GMT

… MATLAB, as shown in Table 2. To enhance sensing quality, we aggregated five adjacent frames into one new frame. In the WiFi CSI data, some sequences contain "-inf" values. The "-inf" number comes from the … To help users, we have embedded this processing code in our dataset tool: when a user loads our WiFi CSI data, these values are handled by linear interpolation. As presented in Section 4.3, we provide the temporal … Each sequence is annotated by at least 5 human annotators.
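The excerpt says the dataset tool replaces "-inf" values in the WiFi CSI data by linear interpolation but does not show that code. A minimal Python sketch for one 1-D sequence follows; the function name is hypothetical, and holding the nearest valid value for gaps at the start or end of a sequence is an assumption, since the excerpt does not say how such edge gaps are handled.

```python
import bisect
import math


def interpolate_neg_inf(seq):
    """Replace -inf entries by linear interpolation between the nearest
    valid neighbours; edge gaps copy the nearest valid value (assumption)."""
    valid = [i for i, v in enumerate(seq) if v != -math.inf]
    if not valid:
        raise ValueError("sequence has no valid values to interpolate from")
    out = list(seq)
    for i, v in enumerate(seq):
        if v != -math.inf:
            continue
        j = bisect.bisect_left(valid, i)  # position of first valid index > i
        if j == 0:  # gap at the start of the sequence
            out[i] = seq[valid[0]]
        elif j == len(valid):  # gap at the end of the sequence
            out[i] = seq[valid[-1]]
        else:
            lo, hi = valid[j - 1], valid[j]
            t = (i - lo) / (hi - lo)
            out[i] = seq[lo] + t * (seq[hi] - seq[lo])
    return out


ninf = -math.inf
print(interpolate_neg_inf([1.0, ninf, 3.0, ninf, ninf, 6.0]))
# → [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

For multi-antenna, multi-subcarrier CSI arrays, the same function would be applied independently along the time axis of each channel.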

artificial intelligence, machine learning, modality

Country: North America > United States > Texas (0.04)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

f26b29298ae8acd94bd7e839688e329b-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-18-2026, 16:43:52 GMT

artificial intelligence, dataset, natural language

Country: North America > United States > Michigan (0.05)

Industry:

Information Technology (0.69)
Law (0.69)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.96)

e607b1419e9ae7cd5cb5b5bb60c2ad5c-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-18-2026, 12:15:03 GMT

artificial intelligence, information management, natural language

Country: North America > United States > California > Santa Clara County > Palo Alto (0.06)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.94)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Information Management (0.94)
Information Technology > Security & Privacy (0.69)

A Additional Results

Neural Information Processing Systems, Feb-17-2026, 10:09:08 GMT

The acronym dataset is a QA task that requires models to decode financial acronyms. The FinMA7B-full model achieved the highest ROUGE-1 score, 0.12, and the … B.1 Why was the datasheet created? … B.2 Has the dataset been used already? If so, where are the results so others can compare (e.g., links to published papers)? Yes, the dataset has already been used: it was employed in the FinLLM Shared Task, known as the FinLLM Challenge, at the FinNLP-AgentScen Workshop at IJCAI 2024.
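For reference, ROUGE-1 measures unigram overlap between a generated answer and a reference. This simplified Python sketch computes the F1 variant with lowercase whitespace tokenization. Standard implementations may add stemming and punctuation handling, and the excerpt does not state which variant or tokenizer produced the 0.12 score. The acronym expansion in the usage lines is illustrative.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall, using
    clipped unigram counts and lowercase whitespace tokens (no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# A model's expansion of "EPS" scored against a longer reference expansion.
print(round(rouge1_f1("earnings per share", "earnings per share ratio"), 3))  # → 0.857
```

Because ROUGE-1 rewards only exact token overlap, a model that outputs a wrong or abbreviated expansion scores near zero, which helps explain why scores on an acronym-decoding task can be as low as the reported 0.12.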

large language model, machine learning, natural language

Country: